
Introduction

Use your own audio recordings instead of text-to-speech for complete control over voice, timing, and delivery. Perfect for professional voice-overs, branded audio, or languages not supported by TTS.

Key Features

  • Custom Voice: Use professional voice talent recordings
  • Full Control: Control timing, tone, and delivery
  • Brand Voice: Maintain consistent brand audio identity
  • Any Language: Use audio in any language or dialect

When to Use: Best for professional voice-overs, branded audio, dialects not supported by TTS, or when you need precise control over delivery.

Quick Start

| Endpoint | Purpose | Documentation |
| --- | --- | --- |
| POST /upload/asset | Upload audio file | API Reference |
| POST /create_video_from_avatar | Create video with audio | API Reference |
| GET /avatar_video/{id} | Check video status | API Reference |

Key Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| voice.type | string | | Must be “audio” when using audio_url |
| voice.audio_url | string | | URL of uploaded audio file |
| voice.voice_id | string | | Voice ID (still required) |
| avatar.avatar_id | integer | | Avatar ID |
| avatar.avatar_type | integer | | 0=Public, 1=Custom |
| aspect_ratio | string | | portrait/landscape/square |
| screen_style | integer | | 1=Full screen, 2=Split screen, 3=Picture in picture |
Important: When using voice.audio_url, set voice.type to “audio” and do NOT include voice.script. The voice.voice_id is still required.
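
The parameter rules above can be checked before sending a request. The following is a minimal sketch, not part of the API: `build_audio_video_payload` is a hypothetical helper name, and the validation it performs is only what the table and note above state.

```python
from typing import Any

ASPECT_RATIOS = {"portrait", "landscape", "square"}
SCREEN_STYLES = {1, 2, 3}  # 1=full screen, 2=split screen, 3=picture in picture


def build_audio_video_payload(
    audio_url: str,
    voice_id: str,
    avatar_id: int,
    avatar_type: int = 0,
    aspect_ratio: str = "portrait",
    screen_style: int = 1,
) -> dict[str, Any]:
    """Build a create_video_from_avatar body that uses uploaded audio.

    Hypothetical helper: it only mirrors the rules documented above.
    """
    if not audio_url:
        raise ValueError("audio_url is required when voice.type is 'audio'")
    if not voice_id:
        raise ValueError("voice_id is still required alongside audio_url")
    if avatar_type not in (0, 1):
        raise ValueError("avatar_type must be 0 (public) or 1 (custom)")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")
    if screen_style not in SCREEN_STYLES:
        raise ValueError("screen_style must be 1, 2, or 3")
    return {
        "avatar": {"avatar_id": avatar_id, "avatar_type": avatar_type},
        # No "script" key here: the audio file replaces the text-to-speech input.
        "voice": {"type": "audio", "voice_id": voice_id, "audio_url": audio_url},
        "aspect_ratio": aspect_ratio,
        "screen_style": screen_style,
    }
```

Serializing the returned dictionary with `json.dumps` gives a request body of the same shape as the one used in the create-video step.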

Audio Requirements

Supported Formats:
  • MP3 (recommended)
  • WAV
  • M4A
Specifications:
  • Max size: 20MB
  • Max duration: 10 minutes
  • Recommended bitrate: 192 kbps for music, 128 kbps for voice
  • Sample rate: 44.1 kHz
Quality Tips:
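# Convert to MP3 at 192 kbps and 44.1 kHz with FFmpeg
ffmpeg -i input.wav -codec:a libmp3lame -b:a 192k -ar 44100 output.mp3

# Normalize audio levels; -ar keeps the output at 44.1 kHz because loudnorm can change the sample rate
ffmpeg -i input.mp3 -filter:a loudnorm -ar 44100 normalized.mp3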
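
The format, size, and duration limits above can be checked locally before uploading. This is a rough sketch under stated assumptions: `check_audio_file` is a hypothetical helper, "20MB" is interpreted as 20 × 1024 × 1024 bytes, and duration is only measured for WAV files because that is the one format the Python standard library can parse.

```python
import os
import wave

MAX_BYTES = 20 * 1024 * 1024  # assumption: "20MB" means 20 MiB
MAX_SECONDS = 10 * 60  # 10 minutes
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a"}


def check_audio_file(path: str) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, limit is {MAX_BYTES}")
    if ext == ".wav":
        # Duration check for WAV only; MP3 and M4A need a third-party parser.
        with wave.open(path, "rb") as wav:
            seconds = wav.getnframes() / wav.getframerate()
        if seconds > MAX_SECONDS:
            problems.append(f"audio is {seconds:.0f} s, limit is {MAX_SECONDS} s")
    return problems
```

An empty result does not guarantee the service accepts the file; it only rules out the failures that can be detected from the extension, size, and WAV header.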

Code Examples

Step 1: Get signed URL for upload

curl --request POST \
  --url 'https://api.jogg.ai/v2/upload/asset' \
  --header 'x-api-key: YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{
    "filename": "audio.mp3",
    "content_type": "audio/mpeg"
  }'
Response:
{
  "code": 0,
  "msg": "Success",
  "data": {
    "sign_url": "https://storage.jogg.ai/upload/signed-url-here",
    "asset_url": "https://res.jogg.ai/assets/aud_abc123.mp3"
  }
}

Step 2: Upload your file using PUT

curl --request PUT \
  --url 'https://storage.jogg.ai/upload/signed-url-here' \
  --header 'Content-Type: audio/mpeg' \
  --data-binary '@/path/to/audio.mp3'
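
Steps 1 and 2 can also be scripted. The sketch below uses only the Python standard library and mirrors the two curl calls above; the function names are hypothetical, and the default content type of `audio/mpeg` is taken from the MP3 example (the values to use for WAV or M4A are not stated in this guide).

```python
import json
import os
import urllib.request

API_BASE = "https://api.jogg.ai/v2"


def signed_url_request(api_key: str, filename: str,
                       content_type: str = "audio/mpeg") -> urllib.request.Request:
    """Step 1: build the POST that asks for a signed upload URL."""
    body = json.dumps({"filename": filename, "content_type": content_type})
    return urllib.request.Request(
        f"{API_BASE}/upload/asset",
        data=body.encode("utf-8"),
        method="POST",
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
    )


def put_file_request(sign_url: str, audio_bytes: bytes,
                     content_type: str = "audio/mpeg") -> urllib.request.Request:
    """Step 2: build the PUT that sends the raw file bytes to the signed URL."""
    return urllib.request.Request(
        sign_url,
        data=audio_bytes,
        method="PUT",
        headers={"Content-Type": content_type},
    )


def upload_audio(api_key: str, path: str, content_type: str = "audio/mpeg") -> str:
    """Run both steps and return the asset_url to use as voice.audio_url."""
    request = signed_url_request(api_key, os.path.basename(path), content_type)
    with urllib.request.urlopen(request) as resp:
        data = json.load(resp)["data"]
    with open(path, "rb") as f:
        audio_bytes = f.read()
    # The same content type is sent in both steps, as in the curl examples.
    with urllib.request.urlopen(put_file_request(data["sign_url"], audio_bytes, content_type)):
        pass
    return data["asset_url"]
```

Splitting request construction from sending keeps the two builders easy to inspect or test without network access; only `upload_audio` performs real HTTP calls.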

Step 3: Create video with the audio

curl --request POST \
  --url 'https://api.jogg.ai/v2/create_video_from_avatar' \
  --header 'x-api-key: YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{
    "avatar": {
      "avatar_id": 81,
      "avatar_type": 0
    },
    "voice": {
      "type": "audio",
      "voice_id": "en-US-ChristopherNeural",
      "audio_url": "https://res.jogg.ai/assets/aud_abc123.mp3"
    },
    "aspect_ratio": "portrait",
    "screen_style": 1,
    "caption": false
  }'
Response:
{
  "code": 0,
  "msg": "Success",
  "data": {
    "video_id": "video_123456"
  }
}
Save the asset_url from the upload response (Step 1) and pass it as voice.audio_url. The video length automatically matches the audio duration.
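
Every response shown in this guide wraps its result in the same `code` / `msg` / `data` envelope. A small helper can unwrap it and surface failures early. This is a sketch under one assumption that the guide does not confirm: that any `code` other than 0 indicates an error. `JoggApiError` and `unwrap` are hypothetical names.

```python
class JoggApiError(Exception):
    """Hypothetical error type; the name is not part of the API."""


def unwrap(response: dict) -> dict:
    """Return the "data" object, assuming code 0 is the only success value."""
    if response.get("code") != 0:
        raise JoggApiError(f"code={response.get('code')} msg={response.get('msg')}")
    return response.get("data", {})


# The create-video response from Step 3, as a parsed JSON object.
create_response = {"code": 0, "msg": "Success", "data": {"video_id": "video_123456"}}
video_id = unwrap(create_response)["video_id"]
```

The extracted `video_id` is the value that goes into the status URL in the next step.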

Step 4: Check Video Status

Poll this endpoint until the status is completed:
curl --request GET \
  --url 'https://api.jogg.ai/v2/avatar_video/video_123456' \
  --header 'x-api-key: YOUR_API_KEY'
Response (Processing):
{
  "code": 0,
  "msg": "Success",
  "data": {
    "video_id": "video_123456",
    "status": "processing",
    "created_at": 1732806631
  }
}
Response (Completed):
{
  "code": 0,
  "msg": "Success",
  "data": {
    "video_id": "video_123456",
    "status": "completed",
    "video_url": "https://res.jogg.ai/videos/video_123456.mp4",
    "cover_url": "https://res.jogg.ai/covers/cover_123456.jpg",
    "created_at": 1732806631
  }
}
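
The polling described above can be wrapped in a loop with a timeout. The sketch below is illustrative only: `wait_for_video` is a hypothetical helper, the interval and timeout defaults are arbitrary, and only the two statuses shown above ("processing" and "completed") are treated as known. The caller supplies `fetch_status`, a function that performs the GET request and returns the `data` object.

```python
import time
from typing import Callable


def wait_for_video(
    fetch_status: Callable[[], dict],
    interval_seconds: float = 10.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until the video is completed, then return its data."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        data = fetch_status()
        status = data.get("status")
        if status == "completed":
            return data
        if status != "processing":
            # Other statuses are not documented here, so stop instead of looping.
            raise RuntimeError(f"unexpected status: {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("video was not ready before the timeout")
        sleep(interval_seconds)
```

Passing `sleep` as a parameter makes the loop easy to exercise with canned responses and no real waiting.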
Instead of polling, use [Webhooks](/api-reference/v2/API Documentation/WebhookIntegration) to get notified instantly when videos are ready!

Use Case Examples

Use recordings from professional voice talent:
  • Record in professional studio
  • Upload final edited audio
  • Create videos with consistent voice
  • Maintain professional quality
Maintain branded audio across videos:
  • Use company spokesperson voice
  • Record once, use in multiple videos
  • Consistent brand audio identity
  • Scale branded content easily
Use best take from multiple recordings:
  • Record several versions
  • Upload the best performance
  • Edit audio before uploading
  • Perfect timing and delivery
Use audio in languages not supported by TTS:
  • Record in any language or dialect
  • Upload custom audio
  • Create videos with native speakers
  • Reach diverse audiences

Tips for Best Results

Audio Quality:
  • Use clear speech (not music) for the best lip sync
  • Encode at 44.1 kHz and 128-192 kbps
  • Trim silence at the start and end
  • Normalize audio levels
  • Check that the audio language matches the avatar's capabilities